Thesaurus as a complex network
Authors
Abstract
A thesaurus is one, out of many, possible representations of term (or word) connectivity. The terms of a thesaurus are seen as the nodes and their relationships as the links of a directed graph. The directionality of the links retains all the thesaurus information and allows the measurement of several quantities. This has led to a new term classification according to the characteristics of the nodes, for example, nodes with no links in, nodes with no links out, etc. Using an electronically available thesaurus, we have obtained the incoming and outgoing link distributions. While the incoming link distribution follows a stretched exponential function, the lower bound of the outgoing link distribution has the same envelope as the scientific paper citation distribution proposed by Albuquerque and Tsallis [1]. However, a better fit is obtained with a simpler function, which is the solution of Riccati’s differential equation. We conjecture that this differential equation is the continuous limit of a stochastic growth model of the thesaurus network. We also propose a new way to arrange a thesaurus using the “inversion method”.
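The node classification described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code or data: the toy thesaurus, the function names (`link_degrees`, `classify`, `distribution`) and the class labels are hypothetical, and the sketch only assumes that each headword points to the terms it lists, giving directed links headword -> term.

```python
from collections import Counter

# Hypothetical toy thesaurus (invented for illustration, not the paper's data):
# each headword points to the terms it lists, giving directed links head -> term.
THESAURUS = {
    "big": ["large", "huge"],
    "large": ["big", "vast"],
    "huge": ["vast"],
    "tiny": ["small"],
    "lone": [],  # a headword that lists nothing and is listed by no one
}


def link_degrees(thesaurus):
    """Return (in_degree, out_degree) dicts covering every term in the graph."""
    nodes = set(thesaurus) | {t for terms in thesaurus.values() for t in terms}
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    # A set of edges, so a term listed twice under one headword counts once.
    edges = {(head, term) for head, terms in thesaurus.items() for term in terms}
    for head, term in edges:
        out_deg[head] += 1
        in_deg[term] += 1
    return in_deg, out_deg


def classify(in_deg, out_deg):
    """Label each node by whether it has incoming and/or outgoing links."""
    labels = {}
    for node in in_deg:
        has_in, has_out = in_deg[node] > 0, out_deg[node] > 0
        if has_in and has_out:
            labels[node] = "links in and out"
        elif has_out:
            labels[node] = "no links in"
        elif has_in:
            labels[node] = "no links out"
        else:
            labels[node] = "isolated"
    return labels


def distribution(degree):
    """Histogram: degree value -> number of nodes with that degree."""
    return dict(Counter(degree.values()))


if __name__ == "__main__":
    in_deg, out_deg = link_degrees(THESAURUS)
    for node, label in sorted(classify(in_deg, out_deg).items()):
        print(f"{node}: {label}")
    print("incoming:", distribution(in_deg))
    print("outgoing:", distribution(out_deg))
```

On a real thesaurus the two histograms returned by `distribution` are what one would then plot and fit (for example, against a stretched exponential for the incoming links); the fitting step is not shown here.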
Similar resources
Feasibility study for compiling a Thesaurus of Women’s and Family Studies based on the BS ISO 25964-1 standard
Research Objective: Feasibility study of the Family and Women’s Studies Thesaurus. Considering the expansion of information in the field of women and family studies, as well as the wide span of related vocabulary and the development of vocabulary lists and bibliographies, the Family and Women’s Studies Thesaurus can be a professional tool for indexing and retrieval of women’s information in data...
A new method for automatic indexing and keyword extraction for information retrieval and text clustering
Persian words are written in diverse forms, and covering all grammatical forms of a word requires a series of specific rules; this makes automatic keyword extraction from Persian texts difficult and complex. This thesis attempts to use linguistic information and a thesaurus to provide more appropriate keywords. Using the symbol system is structured network can be keywords, i...
Lobby index as a network centrality measure
We study the lobby index (l for short) as a local node centrality measure for complex networks. The l is compared with degree (a local measure) and with betweenness and eigenvector centralities (two global measures) in the case of a biological network (the yeast protein-protein interaction network) and a linguistic network (Moby Thesaurus II). In both networks, the l has poor correlation with betweenness...
A study of the status of software for managing and presenting Persian thesauri
The current study investigates software for managing and presenting Persian thesauri. Using a descriptive survey method, we analyzed five thesaurus management software packages, namely “Islamic Sciences Thesaurus”, “Thesaurus Builder”, “Pars Azarakhsh”, “Ghamoos” and the “published version of Ebrahimpoor Thesaurus”, along with four software packages for providing thesau...
Basic word statistics for information retrieval: thesaurus as a complex network
Words are the building blocks used to construct sentences and to transmit information. Here, two distinct hard classification approaches are applied to words. First, we consider words as the nodes and their relationships as the links of a directed graph. This permits us to define, in a natural manner, the thesaurus conformation. The statistics of the outgoing and incoming links are char...
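The lobby index (l) in the related study above is commonly defined as the largest integer k such that a node has at least k neighbours whose degree is at least k, that is, an h-index computed over the degrees of a node's neighbours. A minimal sketch, assuming an undirected, unweighted graph; the toy edge list and the function names (`undirected_adjacency`, `lobby_index`) are hypothetical, and the cited study may treat the direction of thesaurus links differently.

```python
from collections import defaultdict

# Hypothetical toy network (undirected edges), invented for illustration.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def undirected_adjacency(edges):
    """Map each node to the set of its neighbours, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def lobby_index(adj, node):
    """Largest k such that `node` has at least k neighbours of degree >= k."""
    # Neighbour degrees in non-increasing order, as for an h-index.
    degrees = sorted((len(adj[n]) for n in adj[node]), reverse=True)
    k = 0
    for rank, deg in enumerate(degrees, start=1):
        if deg >= rank:
            k = rank
        else:
            break  # degrees only decrease while rank grows, so stop here
    return k


if __name__ == "__main__":
    adj = undirected_adjacency(EDGES)
    for node in sorted(adj):
        print(node, "degree", len(adj[node]), "lobby index", lobby_index(adj, node))
```

Node "e" has a single neighbour of high degree, so its lobby index is 1, while the densely connected nodes reach 3; this is the sense in which l rewards being attached to well-connected neighbours rather than simply having many links.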